Scalable Subspace Snooping

Authors

  • Jaehyuk Huh
  • Doug Burger
Abstract

Snooping tag bandwidth is one of the resources that limits the number of processors that can participate in a cache-coherent snooping system. In this paper, we evaluate a type of coherence protocol called subspace snooping, which decouples the snoop tag bandwidth from the address bus bandwidth. In subspace snooping, each processor snoops a set of logical channels, which are a subset of the total snoopable address busses in the system. Thus, each processor snoops a subset of the address space, reducing the number of tag matches required for a system of a given size. By dynamically assigning both processors and cache lines to channels, we support dynamic formation of subspaces, with the goal of having only sets of processors that share data snooping on each given channel. Subspace snooping aligns best with systems for which the address bus bandwidth greatly exceeds the snooping tag bandwidth. Snooping optical interconnects exhibit such characteristics: they provide enormous transmission bandwidth but quickly become limited by snooping tag energy and bandwidth as the number of processors increases. Optical busses can be subdivided into logical channels using either wave-division or time-division multiplexing, making them good candidates for a subspace snooping implementation. We evaluate a range of subspace snooping protocols on six parallel scientific benchmarks, running on an execution-driven simulator. We show an average 50% reduction in total snoops for 64-processor systems with 8-32 channels, and that for regular applications, subspace snooping shows large performance improvements as the ratio of processor bandwidth to snoop bandwidth grows large. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract NBCH30390004.
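The snoop-count argument in the abstract can be illustrated with a toy model. The sketch below is not the paper's protocol: it assumes a fixed, address-interleaved mapping of cache lines to channels and a static subscription table, whereas the abstract describes dynamic assignment of both. The names `channel_of`, `count_snoops`, `broadcast`, and `subspace` are hypothetical and chosen only for this illustration.

```python
def channel_of(line_addr, num_channels):
    """Assumed static mapping: interleave cache lines across channels."""
    return line_addr % num_channels


def count_snoops(transactions, subscriptions, num_channels):
    """Count tag lookups caused by a list of (requester, line_addr) pairs.

    Each transaction is placed on one channel and triggers one tag lookup
    in every processor subscribed to that channel, except the requester.
    """
    total = 0
    for requester, line_addr in transactions:
        ch = channel_of(line_addr, num_channels)
        total += sum(
            1
            for proc, channels in subscriptions.items()
            if proc != requester and ch in channels
        )
    return total


# Four processors, two channels, four address transactions.
transactions = [(0, 0), (1, 2), (2, 1), (3, 3)]

# Conventional snooping: every processor snoops every channel.
broadcast = {p: {0, 1} for p in range(4)}

# Subspace snooping: processors 0 and 1 share channel 0, 2 and 3 share channel 1.
subspace = {0: {0}, 1: {0}, 2: {1}, 3: {1}}

print(count_snoops(transactions, broadcast, 2))  # 12
print(count_snoops(transactions, subspace, 2))   # 4
```

In this toy setup the saving comes entirely from the sharing pattern matching the subscription table; a real protocol would also have to guarantee that every processor holding a copy of a line is subscribed to that line's channel, which this sketch does not check.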

Similar resources

SYMNET: an optical interconnection network for scalable high-performance symmetric multiprocessors.

We address the primary limitation of the bandwidth to satisfy the demands for address transactions in future cache-coherent symmetric multiprocessors (SMPs). It is widely known that the bus speed and the coherence overhead limit the snoop/address bandwidth needed to broadcast address transactions to all processors. As a solution, we propose a scalable address subnetwork called symmetric multipr...

A scalable anonymous protocol for heterogeneous wireless ad hoc networks

Ensuring anonymity in wireless ad hoc networks is a major security goal. Using traffic analysis, the attacker can compromise the network functionality by correlating data flow patterns to event locations/active areas. In this paper we present a novel Scalable Anonymous Protocol that hides the location of nodes and obscures the correlation between event zones and data flow from snooping adversar...

Scalable Parallel Domain Decomposition Methods for Numerical Simulation of PDEs

This paper is concerned with scalable parallel domain decomposition methods for numerical simulation of PDEs. First, one-level and two-level scalable parallel domain decomposition methods, which can be used to solve different equations, are introduced in detail, and then we explain the Krylov subspace accelerator technique used to improve the convergence of the methods. Last, the results of some nu...

Scalable Solution for Approximate Nearest Subspace Search

Finding the nearest subspace is a fundamental problem and influential to many applications. In particular, a scalable solution that is fast and accurate for a large problem has a great impact. The existing methods for the problem are, however, useless in a large-scale problem with a large number of subspaces and high dimensionality of the feature space. A cause is that they are designed based o...

Non-Intrusive Deep Tracing of SCI Interconnect Traffic

The Scalable Coherent Interface (SCI) is one of the enabling interconnect technologies for high performance computing on PC Clusters. Trinity College Dublin has designed and is currently prototyping a trace instrument that allows deep traces of SCI interconnect traffic. Such an instrument is essential for a detailed spatial and temporal analysis of parallel executed algorithms on loosely couple...

Publication date: 2006